Text Identification
Note: This process requires the purchase of a Quick Fields add-on.
The Text Identification process sorts documents into classes based on whether page text matches a specified pattern or patterns. It can only be used in the Identification stage.
Note: To generate text for use in Text Identification, configure OmniPage OCR or Text Extraction in Pre-Classification Processing.
To configure Text Identification
- In the Session Configuration Pane, select the Identification node.
- In the Tasks Pane, select Text Identification.
- Click Add Pattern... to open the New Text Pattern dialog box.
- Specify a name for the pattern.
- Specify a pattern to match. For a list of common expressions, click the pattern button. For more information, see the Regular Expression Reference.
- To test the pattern, click the Test... button. Specify a value that you expect to fit the pattern. Click OK to see if the expected value is returned. If not, adjust the pattern and test again.
- To make the pattern match case sensitive, select Match Case. To make it case insensitive, clear the Match Case option.
- Click OK.
- Optional: To add another pattern, click Add Pattern... again. The page text must match every pattern in the process for the document to be identified as belonging to the document class.
- Optional: To preview how this process will handle scanned images and OCRed or extracted text, test processes. For the best results, add a custom sample image before testing. Adjust and test until you are satisfied with the results.
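
The matching rules described in the steps above (every pattern must match, with optional case sensitivity) can be illustrated outside Quick Fields with a short sketch. This is not Quick Fields code: it uses Python's re module as a stand-in for the product's regular expression engine, and the function name matches_all, the sample patterns, and the sample page text are all hypothetical. It also assumes a pattern counts as matched if it is found anywhere in the page text.

```python
import re


def matches_all(text, patterns, match_case=False):
    """Return True only if every pattern is found somewhere in the text.

    match_case mirrors the Match Case checkbox: when it is False,
    patterns are compared case-insensitively.
    """
    flags = 0 if match_case else re.IGNORECASE
    return all(re.search(p, text, flags) is not None for p in patterns)


# Hypothetical patterns for an "Invoice" document class.
invoice_patterns = [
    r"Invoice\s+No\.?\s*\d+",  # an invoice number label followed by digits
    r"Total\s+Due",            # a total-due line
]

# Hypothetical OCRed or extracted page text.
page_text = "ACME Corp\nINVOICE NO. 10482\nTotal due: $310.00"

# Match Case cleared: both patterns are found, so the page is identified.
print(matches_all(page_text, invoice_patterns))  # True

# Match Case selected: "INVOICE" no longer matches "Invoice".
print(matches_all(page_text, invoice_patterns, match_case=True))  # False
```

Trying candidate patterns against sample text in this way is comparable to using the Test... button, but the syntax accepted by Quick Fields may differ from Python's, so confirm the final pattern in the New Text Pattern dialog box.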